Phase 2: the eval set must never see pre-annotations
This is the single decision that most human-in-the-loop projects get wrong. If your eval labels were seeded by the model's own predictions, annotators tend to accept those suggestions, and every F1 number you ever report is biased in the model's favor. The fix is cheap on day one and expensive on day forty.
evaluation, hitl, ml ops
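One way to make the rule hard to break is to decide the split before any model runs, and to build eval tasks through a code path that never calls the model. The sketch below is a minimal illustration, not a prescribed pipeline: `EVAL_FRACTION`, `is_eval_item`, `build_task`, `assert_eval_is_clean`, and the task dictionary layout are all hypothetical names chosen for this example, and the hash-based split is an assumed technique.

```python
import hashlib
from typing import Callable

EVAL_FRACTION = 0.1  # hypothetical: share of items reserved for evaluation


def is_eval_item(item_id: str, eval_fraction: float = EVAL_FRACTION) -> bool:
    """Assign an item to the eval split from a stable hash of its ID."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return bucket < eval_fraction


def build_task(item_id: str, text: str, predict: Callable[[str], str]) -> dict:
    """Build an annotation task; eval items never receive a model suggestion."""
    if is_eval_item(item_id):
        # Eval items are labeled from scratch, so the model is never called.
        return {"id": item_id, "text": text, "split": "eval", "suggestion": None}
    return {
        "id": item_id,
        "text": text,
        "split": "train",
        "suggestion": predict(text),  # pre-annotation shown to the annotator
    }


def assert_eval_is_clean(tasks: list[dict]) -> None:
    """Fail loudly if any eval task carries a pre-annotation."""
    leaked = [
        t["id"] for t in tasks
        if t["split"] == "eval" and t["suggestion"] is not None
    ]
    if leaked:
        raise ValueError(f"eval tasks with pre-annotations: {leaked}")


if __name__ == "__main__":
    # Stand-in model for the demo; a real project would call its own predictor.
    tasks = [build_task(f"doc-{i}", "example text", lambda s: "POSITIVE")
             for i in range(1000)]
    assert_eval_is_clean(tasks)
    print(sum(t["split"] == "eval" for t in tasks), "eval tasks, none pre-annotated")
```

Because the split depends only on the item ID, it stays the same across runs and across model versions, so an item that starts in the eval set cannot drift into the pre-annotated pool later. Running `assert_eval_is_clean` as a check before each export is one inexpensive way to catch a leak on day one rather than day forty.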